Szeged
6fee03d84375a159ecd3769ebbacae83-Supplemental-Conference.pdf
Convergence of stochastic gradient descent for non-smooth problems is a known result. For completeness, we reproduce a standard proof and adapt it to our setting. Let us denote by $\mathcal{F}$ the class of functions from $\mathcal{X}$ to $\mathcal{Y}$ that we work with. Assumption 1 states that the model $\mathcal{F}$ is well specified for estimating the median, i.e. the conditional median of $Y$ given $X$ belongs to $\mathcal{F}$. We begin by controlling the estimation error.
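As a hedged illustration of SGD on a non-smooth objective, the sketch below takes the simplest case, where we assume $\mathcal{F}$ contains only constant predictors, so the target is just the median of $Y$. It minimizes the absolute loss $|y - \theta|$ with the subgradient $-\operatorname{sign}(y - \theta)$, a decaying step size $\eta_t = \eta_0/\sqrt{t}$, and Polyak-Ruppert averaging of the iterates. These are common choices for non-smooth SGD and our assumptions, not necessarily those of the proof above; the function name `sgd_median` and its parameters are hypothetical.

```python
import math
import random


def sgd_median(samples, steps=20000, lr0=1.0, seed=0):
    """Averaged subgradient SGD on the absolute loss |y - theta|.

    Hypothetical helper: assumes constant predictors, so the minimizer
    of E|Y - theta| is a median of the sampled distribution.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    theta = 0.0
    avg = 0.0
    for t in range(1, steps + 1):
        y = rng.choice(samples)  # draw one observation uniformly
        # Subgradient of |y - theta| w.r.t. theta is -sign(y - theta);
        # at the kink (y == theta) we pick 0 from the subdifferential.
        g = -1.0 if y > theta else (1.0 if y < theta else 0.0)
        theta -= lr0 / math.sqrt(t) * g  # step size eta_t = lr0 / sqrt(t)
        avg += (theta - avg) / t  # running average of the iterates
    return avg


print(sgd_median([1.0, 2.0, 3.0, 4.0, 100.0]))  # close to the median 3
```

The outlier `100.0` barely affects the estimate, which is the usual motivation for median rather than mean regression.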
Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution
Dimitrios Diochnos, Saeed Mahloujifar, Mohammad Mahmoody
As the current literature contains multiple definitions of adversarial risk and robustness, we start by giving a taxonomy for these definitions based on their direct goals; we identify one of them as the one guaranteeing misclassification by pushing the instances to the error region. We then study some classic algorithms for learning monotone conjunctions and compare their adversarial robustness under different definitions by attacking the hypotheses using instances drawn from the uniform distribution. We observe that sometimes these definitions lead to significantly different bounds. Thus, this study advocates for the use of the error-region definition, even though other definitions, in other contexts with context-dependent assumptions, may coincide with the error-region definition.
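To make the contrast between definitions concrete, here is a brute-force sketch over $\{0,1\}^n$ under the uniform distribution, for small $n$ only. It compares the error-region risk (probability that some point within Hamming distance $r$ of $x$ is misclassified with respect to the target $c$) with a prediction-change risk (probability that some such point receives a different label from $h$ than $x$ does). Choosing prediction change as the representative alternative definition, using the Hamming metric, and the helper names `conj`, `ball`, and `adversarial_risks` are all our assumptions, not the authors' code.

```python
from itertools import combinations, product


def conj(indices):
    """Hypothetical helper: monotone conjunction, 1 iff every listed bit is 1."""
    return lambda x: int(all(x[i] for i in indices))


def ball(x, r):
    """Yield every point within Hamming distance r of x (including x)."""
    for d in range(r + 1):
        for flips in combinations(range(len(x)), d):
            y = list(x)
            for i in flips:
                y[i] ^= 1  # flip bit i
            yield tuple(y)


def adversarial_risks(h, c, n, r):
    """Exact (error-region, prediction-change) risks, uniform on {0,1}^n."""
    points = list(product((0, 1), repeat=n))
    er = pc = 0
    for x in points:
        nbrs = list(ball(x, r))
        # Error region: the adversary can reach a point where h disagrees with c.
        er += any(h(y) != c(y) for y in nbrs)
        # Prediction change: the adversary can make h's label differ from h(x).
        pc += any(h(y) != h(x) for y in nbrs)
    return er / len(points), pc / len(points)


target = conj([0, 1])
print(adversarial_risks(target, target, n=4, r=1))  # (0.0, 0.75)
```

With $h$ equal to the target $x_0 \wedge x_1$, no point is ever misclassified, so the error-region risk is 0, yet the prediction-change risk is 0.75: the two definitions can disagree sharply even for a perfect hypothesis.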
Temporal Anchoring in Deepening Embedding Spaces: Event-Indexed Projections, Drift, Convergence, and an Internal Computational Architecture
Faruk Alpay, Bugra Kilictas, Hamdi Alakkad
We develop an operator-theoretic framework for temporal anchoring in embedding spaces, modeled as drift maps interleaved with event-indexed blocks culminating in affine projections. We provide complete proofs for a variable-block contraction lemma (products of Lipschitz factors), a drift-projection convergence theorem with explicit uniform-gap envelopes, and ontological convergence under nested affine anchors with a robustness variant. We formalize an internal Manuscript Computer (MC) whose computations are defined purely by these operators and prove a rigorous finite-run equivalence theorem (with perturbation bounds). For attention layers, we give a self-contained proof that softmax is $1/2$-Lipschitz in $\ell_2$ and derive sufficient layer-contraction conditions (orthogonal/non-orthogonal heads). All floats are placed exactly where written; the manuscript uses only in-paper pseudocode and appendix figures.
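One standard way to see the softmax bound: with $p = \mathrm{softmax}(z)$, the Jacobian is $J = \mathrm{diag}(p) - pp^\top$, and for a unit vector $v$, $v^\top J v = \sum_i p_i v_i^2 - (\sum_i p_i v_i)^2$ is the variance of $v$ under $p$, which Popoviciu's inequality bounds by $(\max_i v_i - \min_i v_i)^2/4 \le 1/2$. The sketch below is only an empirical sanity check of that claim, not a proof and not the paper's argument; the dimension, sampling range, trial count, and the helper names `softmax`, `l2`, and `max_lipschitz_ratio` are arbitrary assumptions.

```python
import math
import random


def softmax(z):
    m = max(z)  # shift by the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def max_lipschitz_ratio(dim=5, trials=20000, scale=3.0, seed=0):
    """Largest observed ||softmax(x) - softmax(y)|| / ||x - y|| over random pairs."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        x = [rng.uniform(-scale, scale) for _ in range(dim)]
        y = [rng.uniform(-scale, scale) for _ in range(dim)]
        d = l2(x, y)
        if d > 0:
            best = max(best, l2(softmax(x), softmax(y)) / d)
    return best


print(max_lipschitz_ratio())  # never exceeds 0.5 if the bound holds
```

Random sampling only probes the bound from below; the Jacobian argument is what guarantees it everywhere.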